Update site package behavior #232
…SOFTWARE_PREFIX, rather than only supporting the default under host_injections
| -- If EESSI_SITE_SOFTWARE_PREFIX is defined, replace /cvmfs/software.eessi.io (or more generally EESSI_CVMFS_REPO) | ||
| -- by that prefix. This ensures that the directory still contains the os/vendor/arch/micro-arch/accelerator etc | ||
| -- If it is not defined, default to a site installation prefix under host_injections | ||
| site_prefix = os.getenv("EESSI_SITE_SOFTWARE_PREFIX") |
I don't love this approach, there is value in having the path in a fully defined location.
I thought about it more, and I am coming around. I was a bit stuck on having the fixed path because we require a fixed path to be able to do MPI injection. However, these are actually separate issues and this change doesn't affect that.
This will require some documentation updates, and some additional checks in EESSI-extend. We should also update the dev.eessi.io workflow to use this.
Issues for all these are enough for now, but they should be addressed.
Caspar and I discussed this quite a bit yesterday. In some way I also still prefer to have everything in a single and fixed location, but I also see Caspar's point, and being able to do the software installations elsewhere does make sense to me. I still have some concerns, e.g. about where to put the Lmod site hooks: Caspar wanted to keep them under host_injections, as they (may) also affect EESSI modules, and though that makes sense, it does mean that you may have two locations with these 2023.06/software/x86_64/amd/zen3 trees (one for locally built software, one for Lmod hooks). Same question for the site's Lmod cache: for that one it definitely makes sense to store it near the software, meaning you will end up with two of those .lmod directories in different places. Personally, I still don't really like that, as it can easily confuse users/admins.
The current situation, with just a single host_injections directory, also has some drawbacks, e.g. in case you want to do site installations on a local CVMFS repo, while having MPI/GPU libraries on a local disk. You can do it with some symlink trickery, but that's not ideal either.
Anyway, since the default behavior won't change, I was okay with adding the possibility for overriding the software installation prefix.
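For readers following the thread, the prefix selection described in the diff comment above can be sketched in shell. This is only an illustration of the stated rule (replace the repository prefix if EESSI_SITE_SOFTWARE_PREFIX is set, otherwise fall back to host_injections); the function name site_software_path and the example paths are hypothetical, and the actual module implements this in Lua.

```shell
# Hypothetical helper (not part of EESSI): derive the site software path
# from an EESSI software path, following the rule in the diff comment.
site_software_path() (
    eessi_software_path="$1"
    # Assumed default repository prefix, as named in the diff comment.
    cvmfs_repo="${EESSI_CVMFS_REPO:-/cvmfs/software.eessi.io}"
    if [ -n "${EESSI_SITE_SOFTWARE_PREFIX:-}" ]; then
        # Swap the repository prefix for the site prefix; the rest of the
        # tree (version, os, arch, ...) stays intact.
        suffix=${eessi_software_path#"$cvmfs_repo"}
        printf '%s\n' "${EESSI_SITE_SOFTWARE_PREFIX}${suffix}"
    else
        # Default: same tree, with the first "versions" replaced by "host_injections".
        printf '%s\n' "${eessi_software_path}" | sed 's|versions|host_injections|'
    fi
)
```

With the variable unset the result stays under host_injections, so the default behavior is unchanged; setting it only moves the root of the tree.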
Happy to hear you've come around - I do feel that at least having the option as a site to configure it is something we should offer. Whether a site uses it, or prefers redirecting the host-injections symlink (or making e.g. /cvmfs/software.eessi.io/host_injections/2025.06 a symlink, as @ocaisa suggested on chat) is then just a choice that every site has.
I'd personally like to use the environment variable. And I'd probably set it in an internal Lmod hook, so that it's set at the moment users load the EESSI module - that way users can't unset it either.
Companion PR for EESSI-extend is here: #235
I'll make issues for the rest. I think we should do the documentation updates once we've actually put the whole new functionality to a practical test - I'm hoping the idea is more clear after the webinar :)
We should also update the dev.eessi.io workflow to use this.
And yes, if this works well, I think it's nice to make dev.eessi.io use this, as it is essentially also 'building on top' of EESSI.
Docs issue: EESSI/docs#768
software-layer-scripts issue to update dev.eessi.io workflow: #236
Minimal docs update here EESSI/docs#769 (we should still have more extensive docs on how to build on top of EESSI as a site, but that's what EESSI/docs#768 is for).
| -- Make sure the EESSI cache is found, this is specified in the lmodrc.lua in the eessi_software_path | ||
| prepend_path("LMOD_RC", pathJoin(eessi_software_path, ".lmod", "lmodrc.lua")) | ||
| eessiDebug("Adding " .. pathJoin(eessi_software_path, ".lmod", "lmodrc.lua") .. " to LMOD_RC") | ||
| -- Make sure that a cache for site installations can also be found |
How to generate the cache also needs an update, but I need to check that works first. We need to let Lmod know that the module path requires a gateway module, then the hierarchy can be represented in the cache.
It's a separate issue, this just reminds me that I need to look into it.
Do we currently even support caches for local installations? Caspar and I looked into that a bit yesterday, but we couldn't find anything. We plan to cover this in the tutorial, but we should indeed also add documentation for this. With Caspar's proposed change, sites can easily call the create_lmodrc.py from our repo to generate the lmodrc.lua file for their local stack (the EESSI module will then add it to $LMOD_RC), and then it should just be a matter of calling for instance https://github.com/EESSI/filesystem-layer/blob/main/scripts/update_lmod_caches.sh to actually generate the cache files.
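As a rough sketch of the cache step only: Lmod ships a script, update_lmod_system_cache_files, that takes a cache directory (-d), a timestamp file (-t) and a module path. The helper below just prints the command a site might run, so nothing is executed. The helper name, the .lmod/cache layout and the modules/all subdirectory are assumptions for illustration; they would have to match whatever the site's lmodrc.lua actually points to.

```shell
# Hypothetical helper: print (not run) the Lmod command that would refresh
# the cache for a site software tree.
print_cache_update_cmd() (
    site_software_path="$1"
    # Assumed module subdirectory below the site software path.
    module_subdir="${2:-modules/all}"
    # Assumed cache location next to the software, in a .lmod directory.
    dot_lmod="${site_software_path}/.lmod"
    printf '%s -d %s -t %s %s\n' \
        "update_lmod_system_cache_files" \
        "${dot_lmod}/cache" \
        "${dot_lmod}/cache/timestamp" \
        "${site_software_path}/${module_subdir}"
)
```

Printing the command first makes it easy to check that the paths agree with the generated lmodrc.lua before writing any cache files.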
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws for:arch=x86_64/amd/zen2
New job on instance
CI is complaining: the module now sets LMOD_RC to a different value than the init script, so we should probably also change the latter in the same way.
Tried to fix it in casparvl#4.
take EESSI_SITE_SOFTWARE_PREFIX into account, use site's Lmod RC file
… they are always set (by sourcing early), but it's bad style / practice
…'s accelerator path
…he dir exists - just like in the module
| -- by that prefix to get the site accelerator path. This ensures that the directory still contains the | ||
| -- os/vendor/arch/micro-arch/accelerator etc. If it is not defined, default to a site installation prefix under | ||
| -- host_injections | ||
| if site_prefix then |
I forgot this earlier, but we should implement the same logic for the site accelerator path.
| else | ||
| eessi_module_path_site_accel = string.gsub(eessi_module_path_accel, "versions", "host_injections") | ||
| end | ||
| if isDir(eessi_module_path_site_accel) then |
I've changed the condition here: the original would effectively test "Is there an accelerator path in the eessi prefix" and then add the accelerator path in the site prefix. That's totally weird, there's no reason why the addition of an accelerator path for a site should depend on the existence of that same accelerator path in EESSI. I don't see anything wrong with a site extending EESSI by e.g. building software for 2025.06 for a CUDA CC 11.0 target (which we don't support upstream).
Note that I'm not 100% sure that this (e.g. adding a CC11.0 target build) is totally supported with only this change (e.g. we still need to check in line 233 that eessi_module_path_accel is defined in order to construct the right site accelerator path - and that might not be defined if the dir doesn't exist, I'm not sure), but it's a step in the right direction.
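The changed condition can be illustrated with a small shell sketch: the site accelerator module path is added only if that directory exists in the site prefix, independent of whether EESSI itself provides the same accelerator target. The function name and the example target names used with it are hypothetical; the module does this in Lua with isDir.

```shell
# Hypothetical helper: print the site accelerator module path only if that
# directory exists in the site tree.
site_accel_modulepath() (
    site_accel_software_path="$1"   # site accelerator software path (assumed layout)
    accel_target="$2"               # accelerator target, e.g. nvidia/cc80 (example value)
    module_subdir="${3:-modules/all}"   # assumed module subdirectory
    candidate="${site_accel_software_path}/${accel_target}/${module_subdir}"
    # Test the site directory itself, not the matching directory in the EESSI prefix.
    if [ -d "${candidate}" ]; then
        printf '%s\n' "${candidate}"
    fi
)
```

An empty result means there is nothing to add to MODULEPATH for that target.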
| show_msg "Using ${EESSI_MODULEPATH_ACCEL} as additional directory (for accelerators) to be added to MODULEPATH." | ||
| export EESSI_SITE_MODULEPATH_ACCEL=${EESSI_SITE_ACCEL_SOFTWARE_PATH}/${EESSI_ACCELERATOR_TARGET}/${EESSI_MODULE_SUBDIR} | ||
| fi | ||
| if [ -d "${EESSI_SITE_ACCEL_SOFTWARE_PATH}/${EESSI_ACCELERATOR_TARGET}/${EESSI_MODULE_SUBDIR}" ]; then |
Change this in sync with the change in 2023.06.lua, which now conditionally adds the site accelerator path if that accelerator path exists, i.e. this check:
if isDir(eessi_module_path_site_accel) then
…SSI_SITE_MODULEPATH_ACCEL if that dir exists, so precreate it in the check
Testing once more, with the module:
Then, with the
Strange, that still doesn't work. In fact, it doesn't even do a versions => host_injections replacement for the site extension dir. Why??
This is just wrong, how can that be?
The behavior for the
…uld I just create the /opt/eessi default target?
It doesn't seem to be as simple as trying to create the directory:
This seems to be a file (maybe broken symlink?) instead of a directory.
I'll wait for CI to finish, but the mkdir's probably need
…m to not get EESSI_SITE_MODULEPATH_ACCEL set
This is preparatory work for better supporting building on top of EESSI.
Specifically, this change allows sites to customize where their site installation path is through
EESSI_SITE_SOFTWARE_PREFIX. It also makes sure that any caches for the site-installs are picked up by adding the lmodrc file to the LMOD_RC search path. Thus, as long as sites generate an lmodrc.lua (e.g. using the create_lmodrc.py from the software-layer-scripts repo) and run the appropriate commands to generate a cache in the corresponding .lmod folder (analogous to how we do this for EESSI), this will allow them to also have a proper cache for locally installed modules.
An example would be: